Complexity Estimation of Genetic Sequences Using Information-Theoretic and Frequency Analysis Methods
Abstract
The genetic information in cells is stored in DNA sequences, represented as strings of four letters, each corresponding to a specific type of nucleotide. Genomic DNA sequences are rich in periodic patterns, which play important biological roles. The complexity of genetic sequences can be estimated using information-theoretic methods. Low-complexity regions are of particular interest to genome researchers because they indicate sequence repeats and patterns. In this paper, the complexity of genetic sequences is estimated using Shannon entropy, Rényi entropy and relative Kolmogorov complexity. The structural complexity based on periodicities is analyzed using the autocorrelation function and time-delayed mutual information. As a case study, we analyze human chromosome 22 and identify periodicities of 3 bp and 49 bp.
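As a rough illustration of two of the measures named in the abstract, the Python sketch below estimates Shannon and Rényi entropy from single-nucleotide frequencies and uses a simple symbol-match score as a stand-in for an autocorrelation function. It is not the paper's implementation: the function names are hypothetical, the single-symbol frequency model is an assumption, and relative Kolmogorov complexity and time-delayed mutual information are not covered.

```python
import math
from collections import Counter


def base_frequencies(seq):
    """Relative frequency of each symbol in seq (assumed single-symbol model)."""
    counts = Counter(seq)
    total = len(seq)
    return {base: count / total for base, count in counts.items()}


def shannon_entropy(seq):
    """Shannon entropy in bits per symbol: H = -sum(p * log2 p)."""
    return -sum(p * math.log2(p) for p in base_frequencies(seq).values())


def renyi_entropy(seq, alpha):
    """Renyi entropy of order alpha in bits: log2(sum(p**alpha)) / (1 - alpha).

    The limit alpha -> 1 is the Shannon entropy, so that case is delegated.
    """
    if alpha == 1:
        return shannon_entropy(seq)
    power_sum = sum(p ** alpha for p in base_frequencies(seq).values())
    return math.log2(power_sum) / (1 - alpha)


def match_autocorrelation(seq, lag):
    """Fraction of positions i with seq[i] == seq[i + lag].

    A simplified, hypothetical periodicity score: a peak at some lag
    suggests a repeat with that period.
    """
    n = len(seq) - lag
    if n <= 0:
        raise ValueError("lag must be smaller than the sequence length")
    return sum(seq[i] == seq[i + lag] for i in range(n)) / n


if __name__ == "__main__":
    uniform = "ACGT" * 5          # all four bases equally frequent
    repeat = "ACG" * 6            # perfect 3 bp period
    print(shannon_entropy(uniform))
    print(renyi_entropy(uniform, 2))
    print([match_autocorrelation(repeat, lag) for lag in (1, 2, 3)])
```

With four letters, the entropy of a sequence is at most 2 bits per symbol, so values well below 2 in a sliding window would flag a low-complexity region under this simple model, and a peak of the match score at a given lag would hint at a periodicity of that length.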
Similar resources
Estimation of Moisture in Transformer Insulation Using Dielectric Frequency Response Analysis by Heuristic Algorithms
Transformers are among the most valuable assets of power systems. Their maintenance and condition assessment have become a concern for researchers because a large number of transformers are approaching the end of their lifetimes. A transformer's lifetime depends on the life of its insulation, and the insulation's life is in turn strongly influenced by its moisture absorption....
Improvement of effort estimation accuracy in software projects using a feature selection approach
In recent years, feature selection techniques have become an essential requirement for processing and model construction in different scientific areas. In the field of software project effort estimation, applying dimensionality reduction and feature selection methods has become unavoidable. The high volumes of data, costs, and time necessary for gathering data, ...
An Evolutionary and Phylogenetic Study of the BMP15 Gene
DNA sequence data contains a wealth of biologically useful information. Recent innovations in DNA sequencing technology have greatly increased our capacity to determine massive amounts of nucleotide sequences. These sequences can be used to specify the characteristics of different regions, interpret the evolutionary relationships between categorized groups, likelihood of performing multiple com...
Estimation of genetic diversity in rice (Oryza sativa L.) genotypes using SSR markers under salinity stress. Fatemeh Gholizadeh and Saeed Navabpour
To study the genetic diversity in rice (Oryza sativa L.), 29 genotypes consisting of landraces, pure lines and improved lines were evaluated using simple sequence repeat (SSR) markers. A total of 30 SSR primers were used to amplify parts of the rice genome in the germplasm. The PIC values ranged from 0.07 (RM 340) to 0.71 (RM 7426), with an average of 0.45. The results showed a total number of 106...
Genetic Diversity and Molecular Phylogeny of Iranian Sheep Based on Cytochrome b Gene Sequences
Phylogenetic relationships and genetic variation between two Iranian sheep breeds were analyzed using cytochrome b (cyt-b) gene sequences. Genomic DNA was isolated by the salting-out method, and the cytochrome b gene was amplified using the polymerase chain reaction restriction (PCR) method with a pair of primers. A partial sequence of the cyt-b gene of Iranian sheep is 780 bp long and contained 13 variable sites and...
Journal:
- Informatica, Lith. Acad. Sci.
Volume 21
Publication date: 2010